(54) A split-SMP computer system

(57) A computer system includes multiple local
buses (16A, 16B) to which processors (20A, 20E, 20D)
and other devices may be connected. A repeater
(14A, 14B) is coupled to each of the local buses.
Additionally, a top level repeater (12) is coupled to each of
the repeaters (14A, 14B). The repeaters (14A, 14B)
transmit transactions from the corresponding local
buses (16A, 16B) to the top repeater (12). The top
repeater (12), based upon the local or global nature of
the transaction, transmits the transaction to one or more
of the repeaters (14A, 14B). The repeaters (14A, 14B)
receiving the transaction then transmit the transaction
upon the local buses (16A, 16B) attached thereto. If the
transaction is a local transaction, the top repeater (12)
transmits the transaction to those repeaters (14A, 14B)
which are configured into a local domain with the
repeater (14A, 14B) which detected the initial
transaction. The local domain comprises one or more repeaters
(14A, 14B) which are logically interconnected. The local
buses (16A, 16B) attached thereto logically form one
SMP bus to which devices may be attached.
Alternatively, the transaction may be a global transaction. The
top repeater (12) transmits the global transaction to all
repeaters (14A, 14B) in the system. Subsequently, the
transaction is retransmitted upon all of the local buses
(16A, 16B). In one embodiment, a transaction is
determined to be local or global based upon the address
partition containing the address. The address space of the
computer system is divided into multiple address
partitions. Each partition is defined to be either local or
global, and additional properties are defined for each
partition.
Description

This invention is related to the field of symmetrical
multiprocessing systems and, more particularly, to a
symmetrical multiprocessing system including a
hierarchical architecture.

Multiprocessing computer systems include two or
more processors which may be employed to perform 
computing tasks. A particular computing task may be 
performed upon one processor while other processors 
perform unrelated computing tasks. Alternatively, com- 
ponents of a particular computing task may be distrib- 
uted among multiple processors to decrease the time 
required to perform the computing task as a whole. 
Generally speaking, a processor is a device configured 
to perform an operation upon one or more operands to 
produce a result. The operation is performed in 
response to an instruction executed by the processor.

A popular architecture in commercial multiprocessing
computer systems is the symmetric multiprocessor
(SMP) architecture. Typically, an SMP computer system
comprises multiple processors connected through a
cache hierarchy to a shared bus. Additionally connected
to the bus is a memory, which is shared among the
processors in the system. Access to any particular
memory location within the memory occurs in a similar
amount of time as access to any other particular memory
location. Since each location in the memory may be
accessed in a uniform manner, this structure is often
referred to as a uniform memory architecture (UMA).

Processors are often configured with internal 
caches, and one or more caches are typically included 
in the cache hierarchy between the processors and the
shared bus in an SMP computer system. Multiple cop- 
ies of data residing at a particular main memory 
address may be stored in these caches. In order to 
maintain the shared memory model, in which a particu- 
lar address stores exactly one data value at any given 
time, shared bus computer systems employ cache 
coherency. Generally speaking, an operation is coher- 
ent if the effects of the operation upon data stored at a 
particular memory address are reflected in each copy of 
the data within the cache hierarchy. For example, when
data stored at a particular memory address is updated, 
the update may be supplied to the caches which are 
storing copies of the previous data. Alternatively the 
copies of the previous data may be invalidated in the 
caches such that a subsequent access to the particular 
memory address causes the updated copy to be trans- 
ferred from main memory. For shared bus systems, a 
snoop bus protocol is typically employed. Each coherent
transaction performed upon the shared bus is examined
(or "snooped") against data in the caches. If a copy
of the affected data is found, the state of the cache line
containing the data may be updated in response to the
coherent transaction.

Unfortunately, shared bus architectures suffer from
several drawbacks which limit their usefulness in
multiprocessing computer systems. A bus is capable of a
peak bandwidth (e.g. a number of bytes/second which
may be transferred across the bus). As additional
processors are attached to the bus, the bandwidth required
to supply the processors with data and instructions may
exceed the peak bus bandwidth. Since some processors
are forced to wait for available bus bandwidth,
performance of the computer system suffers when the
bandwidth requirements of the processors exceed
available bus bandwidth.

Additionally, adding more processors to a shared
bus increases the capacitive loading on the bus and
may even cause the physical length of the bus to be
increased. The increased capacitive loading and
extended bus length increase the delay in propagating
a signal across the bus. Due to the increased propagation
delay, transactions may take longer to perform.
Therefore, the peak bandwidth of the bus may decrease
as more processors are added.

These problems are further magnified by the continued
increase in operating frequency and performance
of processors. The increased performance
enabled by the higher frequencies and more advanced
processor microarchitectures results in higher bandwidth
requirements than previous processor generations,
even for the same number of processors.
Therefore, buses which previously provided sufficient
bandwidth for a multiprocessing computer system may
be insufficient for a similar computer system employing
the higher performance processors.

Particular and preferred aspects of the invention
are set out in the accompanying independent and
dependent claims. Features of the dependent claims
may be combined with those of the independent claims
as appropriate and in combinations other than those
explicitly set out in the claims.

The problems outlined above are in large part
solved by a computer system in accordance with the
present invention. The computer system includes multiple
local buses to which processors and other devices
may be connected. A repeater is coupled to each of the
local buses. Additionally, a top level repeater is coupled
to each of the repeaters. The repeaters transmit
transactions from the corresponding local buses to the top
repeater. The top repeater, based upon the local or
global nature of the transaction, transmits the transaction
to one or more of the repeaters. The repeaters receiving
the transaction then transmit the transaction upon the
local buses attached thereto.

If the transaction is a local transaction, the top
repeater transmits the transaction to those repeaters
which are configured into a local domain with the
repeater which detected the initial transaction. The local
domain comprises one or more repeaters which are
logically interconnected. The local buses attached thereto
logically form one SMP bus to which devices may be
attached.

Alternatively, the transaction may be a global
transaction. The top repeater transmits the global transaction
to all repeaters in the system. Subsequently, the
transaction is retransmitted upon all of the local buses. In one
embodiment, a transaction is determined to be local or
global based upon the address partition containing the
address. The address space of the computer system is
divided into multiple address partitions. Each partition is
defined to be either local or global, and additional
properties are defined for each partition.

Other objects and advantages of the invention will
become apparent upon reading the following detailed
description and upon reference to the accompanying
drawings in which:

Fig. 1 is a block diagram of one embodiment of a
split-SMP computer system.

Fig. 2 is a more detailed block diagram of a pair of
repeaters according to one embodiment of the
split-SMP computer system shown in Fig. 1.

Fig. 3 is a diagram depicting the physical memory
included in the split-SMP computer system shown
in Fig. 1.

Fig. 4 is a diagram depicting the address space of
the computer system shown in Fig. 1, highlighting
certain address partitions within the address space.

Fig. 5 is a flowchart depicting operation of an
operating system upon the computer system shown in
Fig. 1 according to one embodiment of the present
invention.

Fig. 6 is a flowchart depicting operation of a
repeater in the computer system shown in Fig. 1
according to one embodiment of the present
invention.

While the invention is susceptible to various
modifications and alternative forms, specific embodiments
thereof are shown by way of example in the drawings
and will herein be described in detail. It should be
understood, however, that the drawings and detailed
description thereto are not intended to limit the invention
to the particular form disclosed, but on the contrary, the
intention is to cover all modifications, equivalents and
alternatives falling within the scope of the present
invention.

Turning now to Fig. 1, a block diagram of one
embodiment of a split-SMP computer system 10 is
shown. As shown in Fig. 1, computer system 10
includes a top repeater 12, repeaters 14A-14D,
processors P1-P16 and memories M1-M4. Processors P1-P4
and memory M1 are coupled to a local bus 16A to which
repeater 14A is coupled. Similarly, other processors
P5-P16 and memories M2-M4 are coupled to local buses
16B-16D, to which respective repeaters 14B-14D are
coupled as shown in Fig. 1. Each repeater 14 has a
point to point connection with top repeater 12. Elements
referred to by a reference number followed by a letter
will be collectively referred to herein by the reference
number alone. For example, repeaters 14A-14D are
collectively referred to as repeaters 14. It is noted that the
numbers of various elements as shown and connected
in Fig. 1 are exemplary only; any number of various
elements may be included in alternative configurations.

Repeaters 14A and 14B are logically interconnected
with each other via top repeater 12. In other
words, top repeater 12 routes transactions from
repeater 14A to repeater 14B and vice-versa. Similarly,
repeaters 14C-14D are logically interconnected via top
repeater 12. Generally speaking, the logically
interconnected repeaters 14A-14B and 14C-14D each form an
SMP node by logically combining the local buses 16
coupled thereto into one SMP bus. For example, local
buses 16A and 16B are logically combined into a single
SMP bus. A transaction initiated upon one of local
buses 16A-16B is transmitted by the repeater 14A-14B
coupled thereto to top repeater 12. Top repeater 12
forwards the transaction to the other repeater 14A-14B.
The repeater 14A-14B receiving the transmitted
transaction conveys the transaction upon the respective local
bus 16A-16B. Furthermore, the devices attached to the
local bus upon which the transaction is initiated do not
recognize the transaction (i.e. snoop their caches for
cache coherence, provide data from the memory, etc.)
until the retransmitting repeater is prepared to
retransmit the transaction. In this manner, the devices attached
to the logically interconnected repeaters receive a
transaction substantially simultaneously. Logically,
therefore, the devices are attached to the same local
bus despite the physical disconnection between the
local buses. In one embodiment, the repeater for the
local bus upon which the transaction is initiated does
not retransmit the transaction upon that local bus.
Instead, a signal is asserted to the devices attached to
that local bus to process the transaction. The devices
attached thereto maintain a queue of transactions which
were initiated locally, such that the transactions may be
processed upon receipt of the asserted signal.
Additional details will be provided further below.

The repeaters which are logically interconnected
may be considered to be a "local domain". The devices
within the local domain are involved in every transaction
within the local domain. For example, each device within
the local domain snoops the coherent transactions
performed within the local domain. The devices not
included in the local domain are only involved in global
transactions which are initiated from the local domain.
Because the local domains are independent, the
bandwidth of the system may be larger than the bandwidth of
a system in which all transactions are global. The higher
bandwidth may provide for improved performance of the
computer system. It is noted that the embodiment of
computer system 10 shown in Fig. 1 includes two local
domains. The local domains are separated by a vertical
dashed line 18.

In addition to routing transactions between the
repeaters 14 comprising a local domain, top repeater 12
detects transactions which are indicated to be global
transactions. A global transaction is one which may
access memory outside the local domain in which the
transaction is initiated. Furthermore, global transactions
include transactions which may require transmittal to a
local bus outside of the local domain for coherency
purposes. In the present embodiment, certain address
partitions are defined as detailed further below. From the
address partition containing the address presented for a
transaction, top repeater 12 may determine if the
transaction is global or local. If the transaction is global, then
top repeater 12 transmits the transaction to each of the
repeaters 14 coupled thereto regardless of the local
domain to which the repeaters 14 belong. Similar to
local transactions, the repeaters 14 transmit the
transaction upon the respective local buses 16 substantially
simultaneously. In this manner, computer system 10 as
shown in Fig. 1 operates logically as a two level
hierarchy comprising two logical local buses (i.e. local buses
16A and 16B and local buses 16C and 16D), and one
global interconnection (top repeater 12).

Turning now to Fig. 2, a block diagram depicting
repeaters 14A, 14B, and devices coupled thereto is
shown. Repeater 14A, local bus 16A, and bus devices
20A and 20B are shown as a node 30A (indicated by a
dashed enclosure). Similarly, repeater 14B, local bus
16B, and bus devices 20C and 20D form a node 30B
(indicated by a dashed enclosure). Repeaters 14A and
14B are interconnected by an upper level bus 22.
Although described with respect to Fig. 2 as an upper
level bus 22 for simplicity, the interconnection between
the repeaters 14 may comprise any type of
interconnection. More particularly, the point to point interconnection
of Fig. 1 may be employed, in which upper level bus 22
is actually two point to point connections between the
repeaters 14 and top repeater 12 (shown as a dashed
enclosure upon Fig. 2).

Bus device 20A is a processor device and includes
incoming queue 40A and multiplexor 42A in addition to the
processor element 48A. The processor element 48A
may include a high performance processor and a high
speed cache memory.

Bus device 20B is an input/output (I/O) bus device.
Similar to processor device 20A, I/O bus device 20B
includes an incoming queue 40B and a multiplexor 42B
in addition to I/O element 50. I/O element 50 may
include a bus bridge to a peripheral bus, such as the
Peripheral Component Interconnect (PCI) bus. The PCI
bus may be used to interface to peripheral devices such
as a graphics interface, serial and parallel ports, disk
drives, modems, printers, etc. While the embodiment in
Fig. 2 shows only two bus devices 20 in each node 30,
the number of bus devices 20 may be greater or smaller
depending upon the desired configuration. Also, any
mixture of processor devices and I/O devices may be
present.

Generally speaking, bus devices 20 communicate
with each other by sending and receiving bus
transactions. Bus transactions may perform either memory or
I/O operations. Generally, a memory operation is an
operation causing transfer of data from a source to a
destination. The source and/or destination may be
storage locations within the initiator, or may be storage
locations within system memory. When a source or
destination is a storage location within system memory,
the source or destination is specified via an address
conveyed with the memory operation. Memory
operations may be read or write operations. A read operation
causes transfer of data from a source outside of the
initiator to a destination within the initiator. Conversely, a
write operation causes transfer of data from a source
within the initiator to a destination outside of the initiator.
In Fig. 2, a memory operation may include one or more
transactions upon the buses 16 and bus 22. Bus
transactions are broadcast as bit-encoded packets
comprising an address, command, and source id. Other
information may also be encoded in each packet such
as addressing modes or mask information.

I/O operations are similar to memory operations
except the destination is an I/O bus device. I/O devices
are used to communicate with peripheral devices, such
as serial ports or a floppy disk drive. For example, an I/O
read operation may cause a transfer of data from I/O
element 50 to a processor in processor bus device 20D.
Similarly, an I/O write operation may cause a transfer of
data from a processor in bus device 20D to the I/O
element 50 in bus device 20B. In Fig. 2, an I/O operation
may include one or more transactions upon the buses
16 and bus 22.
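
By way of illustration only, the bit-encoded packet described above (an address, a command, and a source id) might be packed and unpacked as in the following sketch. The field widths and command codes are hypothetical; the description does not specify the actual packet layout.

```python
# Hypothetical field widths; the description does not give the real layout.
ADDR_BITS = 41
CMD_BITS = 4
SRC_BITS = 7

# Hypothetical command codes, for illustration only.
CMD_READ = 0x1
CMD_WRITE = 0x2


def encode_packet(address: int, command: int, source_id: int) -> int:
    """Pack address, command, and source id into a single integer."""
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    if not 0 <= command < (1 << CMD_BITS):
        raise ValueError("command out of range")
    if not 0 <= source_id < (1 << SRC_BITS):
        raise ValueError("source id out of range")
    return (address << (CMD_BITS + SRC_BITS)) | (command << SRC_BITS) | source_id


def decode_packet(packet: int) -> tuple[int, int, int]:
    """Unpack a packet into (address, command, source_id)."""
    source_id = packet & ((1 << SRC_BITS) - 1)
    command = (packet >> SRC_BITS) & ((1 << CMD_BITS) - 1)
    address = packet >> (CMD_BITS + SRC_BITS)
    return address, command, source_id
```

Additional fields such as addressing modes or mask information could be appended in the same manner.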

The architecture shown in Fig. 2 may be better
understood by tracing the flow of typical bus
transactions. For example, a bus transaction initiated by
processor element 48 of bus device 20A is issued on
outgoing interconnect path 44A. The transaction is seen
as outgoing packet P1(o) on local bus 16A. Each bus
device connected to local bus 16A, including the
initiating bus device (20A in this example), stores the
outgoing packet P1(o) in its incoming queue 40. Also,
repeater 14A broadcasts the packet P1(o) onto the bus
22 where it appears as packet P1. The repeaters in
each of the non-originating nodes 30 receive the packet
P1 and drive it as an incoming packet P1(i) on their
respective local buses 16. Since the embodiment
illustrated in Fig. 2 only shows two nodes 30, repeater 14B
would receive packet P1 on the bus 22 and drive it as
incoming packet P1(i) on local bus 16B, in the above
example. It is important to note that repeater 14A on the
node 30A from which the packet P1 originated as
outgoing packet P1(o), does not drive packet P1 back down to
local bus 16A as an incoming packet. Instead, when the
other repeaters, such as repeater 14B, drive packet P1
on their respective local buses, repeater 14A asserts
incoming signal 36A. Incoming signal 36A alerts each
bus device in the originating node to treat the packet
stored in its incoming queue 40 as the current incoming
packet. The repeater 14B in non-originating node 30B
does not assert its incoming signal 36B. Thus devices
20C and 20D bypass their incoming queues 40 and
receive the incoming packet P1(i) from local bus 16B.
Multiplexors 42 are responsive to the incoming signal
and allow each device to see either the packet on the
local bus 16 or the packet at the head of incoming
queue 40 as the current transaction packet.
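
The packet flow traced above may be modeled in software as a rough sketch. The class and function names below are hypothetical, and bus timing is reduced to a single function call; the sketch only shows how the incoming signal selects between the queue head and the bypass path.

```python
from collections import deque


class BusDevice:
    """Bus device with an incoming queue and a multiplexor (after Fig. 2)."""

    def __init__(self, name):
        self.name = name
        self.incoming_queue = deque()  # outgoing packets seen on the local bus
        self.seen = []                 # packets treated as current transactions

    def observe_outgoing(self, packet):
        # Every device on the originating local bus queues the outgoing packet.
        self.incoming_queue.append(packet)

    def current_packet(self, incoming_signal, bus_packet=None):
        # Multiplexor: take the queue head when the incoming signal is
        # asserted; otherwise bypass the queue and take the local bus packet.
        packet = self.incoming_queue.popleft() if incoming_signal else bus_packet
        self.seen.append(packet)
        return packet


def broadcast(packet, origin, nodes):
    """Present one packet to every device in every node in the same step."""
    for device in nodes[origin]:
        device.observe_outgoing(packet)
    for node_id, devices in nodes.items():
        for device in devices:
            if node_id == origin:
                # Originating repeater asserts its incoming signal.
                device.current_packet(incoming_signal=True)
            else:
                # Non-originating repeater drives the packet on its local bus.
                device.current_packet(incoming_signal=False, bus_packet=packet)
```

For example, with nodes 30A (devices 20A, 20B) and 30B (devices 20C, 20D), calling `broadcast("P1", "30A", nodes)` leaves every device having seen packet P1 exactly once, with the queues of node 30A drained.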

In the above example, storing the outgoing packet
P1(o) in the incoming queues 40 of bus devices 20 in
the originating node 30A frees up the local bus 16A to
broadcast another outgoing packet while the first packet
is being broadcast on the local bus 16 of the other
non-originating repeater nodes (30B) and is being presented
from the incoming queues 40 in the originating node
30A. Thus, the same bus transaction never appears
more than a single time on any given bus, thereby
allowing maximum utilization of the bus bandwidth.

Generally speaking, each device on a given local
bus 16 stores the outgoing transaction packets that
appear on that bus in their incoming queues 40. The
repeater 14 for that repeater node broadcasts all
outgoing transaction packets to the bus 22 in the same order
they appear on the originating local bus 16. The
repeater for each repeater node 30 drives packets from
the bus 22 on to its local bus 16 as incoming packets
only if the packet did not originate from that node 30. If
the packet originated from a particular node 30 (the
originating node), then that node asserts the incoming
signal 36 instead of re-driving the packet during the bus
cycle that the other repeaters 14 are driving the packet
as an incoming packet. Thus all bus devices 20 see the
transaction at the same time. The devices 20 in the
originating node see the packet from their incoming queues
40 and devices 20 in non-originating nodes see the
packet on their local bus 16 via their respective bypass
paths 46 (e.g. bypass path 46A in bus device 20A).
Since bus devices 20 in the originating node use their
respective incoming queues 40 to view the packet, the
local bus 16 in the originating node is free to broadcast
another outgoing packet. In this manner, the full
bandwidth of the bus 22 may be utilized.

Since outgoing transaction packets are broadcast
in the same order as issued (allowing for arbitration
between devices) and appear at each device during the
same bus cycle, the hierarchical bus structure of the
local buses 16, repeaters 14, and bus 22 appears as a
single large logically shared bus to the bus devices 20.
However, many more bus devices 20 may be supported
by the hierarchical structure of Fig. 2 than would be
allowable on a single physically shared bus. In one
embodiment, the memory physically located in each
node 30 (not shown) collectively appears as a single
logical memory forming the system memory. The
system memory may generally be accessed by any bus
device 20.

Turning now to Fig. 3, a diagram depicting allocation
of the physical memory, or system memory, among
the nodes of the multiprocessor system is shown. In one
embodiment, the physical memory is equally divided
among the processing nodes. Therefore, each of n
processing nodes holds 1/n of the total physical memory
locations. As illustrated in Fig. 3, in a four node
multiprocessing system, physical memory 60 is divided into
four local memories (M1 through M4). It is noted that a
multiprocessing system could allocate the memory in
different proportions between the nodes. More
particularly, a first node may include a first amount of memory,
a second node may include a second amount of memory
dissimilar from the first amount, etc.
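
As a rough sketch of such an allocation, the node holding a given physical address may be found from a list of per-node memory sizes. The sketch assumes that each node holds one contiguous range of physical addresses, which the description does not state; the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_boundaries(sizes: list[int]) -> list[int]:
    """Return the exclusive upper address bound of each node's memory."""
    return list(accumulate(sizes))


def node_for_address(address: int, boundaries: list[int]) -> int:
    """Return the 0-based index of the node holding a physical address.

    Assumes contiguous per-node ranges (an assumption of this sketch).
    """
    if not 0 <= address < boundaries[-1]:
        raise ValueError("address outside physical memory")
    return bisect_right(boundaries, address)
```

With four equal memories, `build_boundaries([1024] * 4)` places address 1024 in the second node (index 1). Unequal sizes such as `[512, 1536]` are handled by the same lookup.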

Referring now to Fig. 4, each memory location is
mapped to multiple locations within an address space
70. Address space 70 is comprised of multiple address
partitions. Each physical memory location can be
accessed using a plurality of address aliases (i.e., one
from each partition). For example, a location 80 may be
mapped to a location 82A within SS space 72, a location
82B within LS space 74, a location 82C within RR space
76, and a location 82D within RS space 78.

In one embodiment, address space 70 includes
four address partitions: SMP-space 72 (SS), local-space
74 (LS), remote read space 76 (RR), and remote
space 78 (RS). Each address partition is assigned
properties which repeater 12 uses to control the transfer of
data in the hierarchical structure. The properties of each
address partition are discussed in more detail below.
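
One way to picture the four aliases of a single physical location is to let the high-order address bits select the partition. This encoding is purely an assumption made for illustration; the description does not say how the partitions are laid out within address space 70, and the 30-bit offset width is hypothetical.

```python
from enum import IntEnum


class Partition(IntEnum):
    SS = 0  # SMP space 72 (global)
    LS = 1  # local space 74
    RR = 2  # remote read space 76
    RS = 3  # remote space 78


OFFSET_BITS = 30  # hypothetical width of the physical location field


def alias(physical_location: int, partition: Partition) -> int:
    """Return the address alias of a physical location in a partition."""
    if not 0 <= physical_location < (1 << OFFSET_BITS):
        raise ValueError("physical location out of range")
    return (int(partition) << OFFSET_BITS) | physical_location


def split_alias(address: int) -> tuple[Partition, int]:
    """Recover (partition, physical location) from an address alias."""
    return Partition(address >> OFFSET_BITS), address & ((1 << OFFSET_BITS) - 1)
```

Under this assumed encoding, a single location yields four distinct aliases, one per partition, and each alias decodes back to the same physical location.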

SS address partition 72 is the global address parti- 
tion. Address aliases in this address partition are broad- 
cast globally to all repeaters 14. Therefore, accesses to 
a physical memory location not within a particular local 
domain should use an SS address alias to access those 
memory locations. In addition, accesses to a memory 
location within the local domain but designated as global
memory should use an SS address alias. Local
memory may be designated as global memory if a proc- 
ess in a different local domain requires access to that 
memory. 

LS address partition 74 is the local address
partition. An address alias from LS address partition 74 may
only be used to access the portion of memory that is
allocated to that local domain. An access using an LS
address alias to a physical memory location not within
the local domain causes a trap when doing a page table
walk or TLB access. In the present embodiment, the
operating system maintains a per processor or per node
page table structure. A processor may only access
translations stored in the processor's page table
structure (or the page table structure of the node containing
the processor). The trap occurs due to the fact that the
translation does not exist within the page table structure
of the initiating processor.

RR address partition 76 is used to read data from
remote addresses. In one embodiment, processes
running on a node may use RR address partition 76 to
perform a read-stream transaction from a remote address.
Read-stream is a transaction performed in response to
execution of a block read instruction defined by the
SPARC architecture to transfer data without caching the
data. Similarly, RS address partition 78 is used to
perform read-stream and write-stream transactions to
remote memory. Write-stream is a transaction
performed in response to a write-block instruction defined
by the SPARC architecture.

The address partitions are used to restrict and
control the flow of data within computer system 10. Top
repeater 12 decides whether to broadcast a transaction
to all repeaters 14 or to limit it to a local domain of
repeaters based on the address partition of the
transaction. For example, if a node 30 attempted to address
data stored in a memory location allocated to another
node 30, an SS address alias should be used to access
the data. When repeater 12 receives a transaction with
an SS address alias, it broadcasts the transaction to
each repeater 14. In contrast, if a node 30 attempts to
access a memory location within the local domain which
is not shared with other nodes outside the local domain,
an LS address alias should be used. When repeater 12
receives an LS address alias, it does not broadcast the
transaction to non-local repeaters 14.

An issue arises when a process migrates from one
node in computer system 10 to another node. Process
migration occurs when a process originally assigned to
one node is suspended and later reassigned to another
node. Memory that was local to the process when
originally assigned may no longer be local. For example, if
a process is originally assigned to a first node 30 and
later reassigned to a second node 30 which is not
logically local with the first node 30, what was originally
local memory to the process is now remote (i.e.
allocated to a different local domain). The process,
however, may not be aware that the memory location is no
longer local. If the process attempts to access a memory
location local to first node 30 using an LS address
alias, a trap will occur. The trap occurs because the
translation for the virtual address corresponding to the
LS address alias is not contained within the page table
structure accessed by the second node 30. A trap
transfers control to the operating system. The operating
system moves the data that was attempted to be accessed
from first node 30 to second node 30 using RR 76
address aliases and local (LS) writes. The memory can
then be accessed using an LS address alias. The use of
a hierarchical affinity scheduler, which reduces the
migration of processes from one node to another, can
minimize the occurrence of moving data blocks from
one node to another.

An alternative to copying the data from one node to
another is to change the translation of the address from
local to global. For example, the following process may
be used:

(1) Invalidate the local translation in all local
translation tables;

(2) Invalidate the translations in the TLBs (e.g.
perform a TLB shootdown);

(3) Flush all cache lines within the page for all
processors in the local node; and

(4) Create a new, global translation for the page.
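
The four steps above may be sketched in software as follows. The data structures are simplified stand-ins chosen for this sketch (dictionaries for translation tables and TLBs, sets of line addresses for caches), and the page size is hypothetical; an actual operating system would use hardware-specific mechanisms for the TLB shootdown and cache flush.

```python
PAGE_SIZE = 8192  # hypothetical page size in bytes


def make_page_global(vpage, old_local_addr, new_global_addr,
                     local_tables, all_tables, tlbs, local_caches):
    """Change the translation of one page from local to global.

    Tables and TLBs map a virtual page number to a page-aligned address;
    each cache is a set of cached line addresses.
    """
    # (1) Invalidate the local translation in all local translation tables.
    for table in local_tables:
        table.pop(vpage, None)

    # (2) Invalidate the translations in the TLBs (TLB shootdown).
    for tlb in tlbs:
        tlb.pop(vpage, None)

    # (3) Flush all cache lines within the page for all local processors.
    page_range = range(old_local_addr, old_local_addr + PAGE_SIZE)
    for cache in local_caches:
        cache.difference_update({line for line in cache if line in page_range})

    # (4) Create a new, global translation, visible in every table.
    for table in all_tables:
        table[vpage] = new_global_addr
```

Placing the new translation in every table in step (4) follows the per-node page table arrangement, in which a global translation is available to all nodes.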

Turning next to Fig. 5, a flow diagram depicting a
portion of the operation of an operating system in
accordance with one embodiment of the present
invention is shown. The portion shown in Fig. 5 depicts the
activities performed when a page of memory is
allocated to a process. A page may be allocated via an
explicit request by the process being executed.
Alternatively, a certain number of pages may be automatically
allocated upon initiation of a process.

During a step 90, the operating system selects a 
page for allocation to the process. Generally, the operating
system maintains a list of "free" pages (i.e. those
pages which are not currently allocated to a process). 
One of the free pages is allocated to the process. If no 
pages are free, the operating system selects a currently 
allocated page, deallocates the page from the process 
to which it was allocated (including saving the data 
within the page to disk and invalidating the translation 
for the page), and allocates the page to the requesting 
process. Many algorithms are well known for selecting 
allocated pages for reallocation to a new process, gen- 
erally known as demand-paged algorithms. 
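
Step 90 may be illustrated with a minimal sketch. First-in-first-out replacement is used here only as a stand-in for whichever demand-paged algorithm is chosen; the class name is hypothetical, and the list `saved_to_disk` merely records where a real system would save the page to disk and invalidate its translation.

```python
from collections import OrderedDict


class PageAllocator:
    """Selects a page for a process from a free list, evicting if needed."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))
        self.allocated = OrderedDict()  # page -> owner, oldest allocation first
        self.saved_to_disk = []         # (page, previous owner) records

    def select_page(self, process):
        if self.free_pages:
            # A free page is available: allocate it directly.
            page = self.free_pages.pop(0)
        else:
            # No free pages: deallocate the oldest allocated page (FIFO is
            # an assumption of this sketch, not taken from the description).
            page, previous_owner = self.allocated.popitem(last=False)
            self.saved_to_disk.append((page, previous_owner))
        self.allocated[page] = process
        return page
```

With two pages, allocating to processes A, B, and C in turn gives pages 0, 1, and then 0 again, with the page previously held by A recorded as saved.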

Upon selection of a page to allocate to the process, 
the operating system determines if the page should be 
allocated as local or global (step 92). A variety of algo- 
rithms may be used to select local versus global. As 
described in detail below, one scheme involves allocating
pages as local initially, then changing the allocation
to global upon occurrence of a trap during an attempt to 
access the page. Another scheme involves allocating 
pages as global initially, and later determining which 
pages to change to local based upon usage of the page 
by various nodes. It is noted that any suitable scheme 
may be employed.

If a page is determined to be global, then (as illustrated
in a step 94) the operating system creates a global
address translation (i.e. a translation to an address
within SS space 72, RR space 76, or RS space 78). If
the page table structure employed by the computer system
is such that each processor or each node has its
own page table structure, the translation is placed into
all page table structures. Alternatively, the operating
system may determine that the page should be local. As
illustrated in a step 96, the operating system creates a
local translation available only within the local domain.
In the exemplary page table structure described above,
the translation is placed only in the page table structure
of the node containing the memory. If processors in
other nodes attempt to access the address, no translation
will be found in their page table structures and a
trap will occur.
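
Steps 94 and 96 can be illustrated with one page table per node. The sketch below is only a model under stated assumptions: the names `NodePageTables`, `map_global`, `map_local`, and `TranslationTrap` are hypothetical, and a Python dictionary stands in for whatever page table structure the computer system actually employs.

```python
class TranslationTrap(Exception):
    """Raised when a node finds no translation for a virtual page."""


class NodePageTables:
    """One page table (a dict: virtual page -> address) per node."""

    def __init__(self, num_nodes):
        self.tables = [dict() for _ in range(num_nodes)]

    def map_global(self, vpage, global_addr):
        # Step 94: a global translation is placed into all page tables.
        for table in self.tables:
            table[vpage] = global_addr

    def map_local(self, vpage, local_addr, home_node):
        # Step 96: a local translation is placed only in the table of the
        # node containing the memory.
        self.tables[home_node][vpage] = local_addr

    def translate(self, node, vpage):
        try:
            return self.tables[node][vpage]
        except KeyError:
            # No translation in this node's table: the access traps.
            raise TranslationTrap(
                f"node {node} has no translation for page {vpage}"
            ) from None
```

With two nodes, a page mapped through `map_local` on node 0 translates normally on node 0 and raises `TranslationTrap` on node 1, which is the trap behavior described above.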

Turning next to Fig. 6, a flow diagram depicting
operation of top repeater 12 in accordance with one
embodiment of the present invention is shown. During a
step 100, a repeater 14 detects an address of a transaction
being presented upon the corresponding local bus
16. The repeater 14 transmits the address to top
repeater 12. As illustrated in step 102, top repeater 12
determines if the address is local or global by determining
which of the address partitions depicted in Fig. 4
contains the address.

If the address is within a local address partition
(e.g. LS address partition 74 for the address space
depicted in Fig. 4) then top repeater 12 does not transmit
the transaction to repeaters 14 outside of the local
domain from which the address emanates. Instead,
repeaters 14 within the local domain receive the transaction
and those outside the domain do not. As illustrated
in step 104, the transaction completes within the
local domain. Alternatively, the address may be within a
global address partition (e.g. SS address partition 72,
RR address partition 76, or RS address partition 78 for
the address space depicted in Fig. 4). As illustrated in
step 106, top repeater 12 broadcasts the transaction to
all other nodes. The transaction subsequently completes
based upon responses from all nodes, not just
from the local node (step 108).
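
The decision made in steps 102 through 106 can be sketched as a lookup followed by a routing choice. The encoding below is purely an assumption for illustration: the text does not state how partitions are laid out, so the sketch selects the partition from the top two bits of a 32-bit address. The function names and the dictionary describing local domains are likewise hypothetical.

```python
LOCAL_PARTITIONS = {"LS"}
GLOBAL_PARTITIONS = {"SS", "RR", "RS"}

# Hypothetical layout: the top two bits of a 32-bit address pick the partition.
PARTITION_BY_TOP_BITS = {0b00: "SS", 0b01: "LS", 0b10: "RR", 0b11: "RS"}


def partition_of(address):
    """Step 102: find the address partition containing the address."""
    return PARTITION_BY_TOP_BITS[(address >> 30) & 0b11]


def route(address, source_repeater, local_domains, all_repeaters):
    """Return the set of repeaters that receive the transaction.

    local_domains maps each repeater to the repeaters in its local domain
    (including itself).
    """
    if partition_of(address) in LOCAL_PARTITIONS:
        # Step 104: the transaction stays within the originating local domain.
        return set(local_domains[source_repeater])
    # Step 106: a global transaction is broadcast to every repeater.
    return set(all_repeaters)
```

For example, with repeaters 14A and 14B each forming its own local domain, an LS address presented at 14A is routed only to 14A, while an RR address is routed to both.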

It is advantageous to modify the designation of
memory as local or global. For example, if two processes
are both accessing a page of data designated as
local memory, that page is being moved from one node
to another using RR address aliases each time a different
process accesses it. In this case, it would be advantageous
to designate that page of data as global. In
addition, a page accessed by only one process and
designated as global memory unnecessarily wastes
bandwidth. Because the number of processes accessing
a memory block changes, a method of dynamically
changing the designation of memory blocks is desirable.

Several algorithms can be used for dynamically
changing the designation of memory. In one embodiment,
all pages of memory are originally designated as
local and a counter keeps track of how many times a
page is moved due to improper accesses using LS
address aliases. When a threshold has been reached,
the page is converted to global and no more block
moves are required. In another embodiment, each page
is started off with a global designation. Pages are individually
changed to local one at a time. Pages are then
converted back to global using the algorithm discussed
above. In yet another alternative, pages may be initially
set to local and changed to global as traps occur.
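
The first embodiment above (pages start local, and a counter converts a page to global once a threshold of moves is reached) can be sketched as follows. The class name `PageDesignation`, its method names, and the threshold value used in the usage note are assumptions; the text does not specify a threshold.

```python
class PageDesignation:
    """Tracks a local/global designation and a move counter for each page."""

    def __init__(self, threshold):
        self.threshold = threshold  # number of moves that triggers conversion
        self.designation = {}       # page -> "local" or "global"
        self.move_count = {}        # page -> moves recorded so far

    def add_page(self, page):
        # All pages are originally designated as local.
        self.designation[page] = "local"
        self.move_count[page] = 0

    def record_move(self, page):
        """Record one move of a local page and return its designation."""
        if self.designation[page] == "global":
            return "global"  # global pages are no longer moved or counted
        self.move_count[page] += 1
        if self.move_count[page] >= self.threshold:
            self.designation[page] = "global"
        return self.designation[page]
```

With a threshold of 3, the first two calls to `record_move` for a page leave it local and the third converts it to global, after which no further moves are counted.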

In one specific embodiment, address partitions are
used to prevent a software bug in one node from corrupting
data in another node. In this embodiment, only
LS 306 and RR 308 address partitions are employed.
This is performed by configuring the repeaters 14 such
that only RR address aliases are broadcast. Each node
runs its own kernel of the operating system. A kernel is
a portion of the operating system encompassing a
nucleus of the basic operating system functions. Each
kernel is resident in the local memory of that node and
is designated as local memory space. Therefore, nodes
can only access the kernel of another node using read-
only instructions. If a process on one node attempts to
access the kernel of another node, the repeater will not
broadcast the data request. The kernel can only be
accessed from remote nodes using RR (read-only)
address aliases. In this manner, a software bug running
on one node cannot crash the kernel or any applications
running on different nodes.

Any communication between nodes is performed
in a poll-based manner. Each node designates a memory
location to store status bits indicating that the node
has data for a process running on another node. The
other processes periodically poll these status bits using
RR address aliases, which are read only. When a process
detects that another node has data for that process,
the data is read using RR aliases. In this manner, data
is transferred between nodes without any node having
write access to another node. Therefore, corrupted software
in one node is unable to write data to other nodes
in the hierarchical bus, and corrupted software on one
node is unable to corrupt software in other nodes. Alternatively,
global interrupts may be supported between
the processors instead of the poll-based scheme.
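
The poll-based scheme can be modeled in a few lines. This is a sketch under stated assumptions, not the hardware mechanism: the `Node` class, its method names, and the `ReadOnlyAliasError` exception are hypothetical, and the way a status bit is later cleared is not described in the text and is left out.

```python
class ReadOnlyAliasError(Exception):
    """Raised when a write is attempted through a read-only (RR) alias."""


class Node:
    def __init__(self, name):
        self.name = name
        self.status = {}  # destination node name -> status bit
        self.outbox = {}  # destination node name -> data for that node

    def post(self, dest, data):
        # Local (read/write) access: only the owning node updates its memory.
        self.outbox[dest] = data
        self.status[dest] = True

    def rr_read_status(self, reader):
        # Remote read through an RR alias: read only.
        return self.status.get(reader, False)

    def rr_read_data(self, reader):
        return self.outbox.get(reader)

    def rr_write(self, *args):
        # Writes are never carried by an RR alias.
        raise ReadOnlyAliasError("RR address aliases are read only")


def poll(reader, others):
    """Poll the status bits of other nodes and read any pending data."""
    received = {}
    for node in others:
        if node.rr_read_status(reader.name):
            received[node.name] = node.rr_read_data(reader.name)
    return received
```

In this model a node only ever writes its own `status` and `outbox`, so data moves between nodes without any node having write access to another.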

Although the present invention has been described 
in connection with the described embodiments, it is not
intended to be limited to the specific form set forth 
herein, but on the contrary, it is intended to cover such 
alternatives, modifications, and equivalents, as can be 
reasonably included within the spirit and scope of the 
invention as defined by the appended claims. 

Numerous variations and modifications will become 
apparent to those skilled in the art once the above dis- 
closure is fully appreciated. It is intended that the follow- 
ing claims be interpreted to embrace all such variations 
and modifications. 



Claims 

1. A multiprocessing computer system comprising:

a first local domain comprising a first processor 
and a first memory, wherein said first processor 
is configured to access a memory location 
within said first memory in a read/write mode 
via a first address within a local address parti- 
tion of an address space employed by said 
multiprocessing computer system; 

a second local domain comprising a second
processor, wherein said second processor is
configured to access said memory location in a
read-only mode via a second address within a
read-only address partition of said address
space; and

a first repeater coupled between said first local
domain and said second local domain, said first 
repeater configured to transmit transactions 
between said first local domain and said sec- 
ond local domain if said transactions have a 
corresponding address within said read-only 
address partition, said first repeater further 
configured to inhibit transmission of transac- 
tions between said first local domain and said 
second local domain if said transactions have
said corresponding address within said local
address partition, whereby said second processor
is prevented from updating said first
memory location. 

2. The multiprocessing computer system as recited in
claim 1 wherein said second local domain comprises
a second repeater coupled between said
second processor and said first repeater, said second
repeater configured to inhibit a broadcast of a
particular transaction having a particular address
within a global address partition which is a
read/write address partition.

3. The multiprocessing computer system as recited in 
claim 2 wherein said second local domain further 
comprises a second memory. 

4. The multiprocessing computer system as recited in 
claim 3 wherein, during use, said first memory
stores a first operating system kernel and said sec- 
ond memory stores a second operating system ker- 
nel. 

5. The multiprocessing computer system as recited in 
claim 4 wherein said first and second operating system
kernels operate independent of one another.

6. The multiprocessing computer system as recited in 
claim 5 wherein said first and second operating system
kernels communicate via said read-only
address partition. 



7. The multiprocessing computer system as recited in
claim 6 wherein said memory location stores at
least one status bit, and wherein said second processor
polls said status bit using said read-only
address partition.

8. The multiprocessing computer system as recited in
claim 7 wherein said first processor updates said
status bit using said local address partition.

9. The multiprocessing computer system as recited in
claim 1 wherein said first local domain further comprises
a third repeater coupled between said first
processor and said first repeater, a third processor,
and a fourth repeater coupled between said third
processor and said first repeater.

10. The multiprocessing computer system as recited in
claim 9 wherein said third repeater is configured to
transmit a first transaction initiated by said first
processor to said first repeater.

11. The multiprocessing computer system as recited in
claim 10 wherein said first repeater routes said first
transaction to said fourth repeater regardless of
which address partition contains a first corresponding
address corresponding to said first transaction.

12. The multiprocessing computer system as recited in
claim 11 wherein said fourth repeater transmits said
first transaction to said third processor, whereby
said third processor participates in said first transaction.

13. The multiprocessing computer system as recited in
claim 12 wherein said third processor participates
in said first transaction in order to maintain coherency
for said first corresponding address.

14. A method for operating a multiprocessing computer
system in a protected mode, comprising:

accessing a memory location within a first local
domain by a first processor within said first
local domain, said first processor using a first
address included within a local address partition
having a read/write mode for said memory
location;

accessing said memory location by a second
processor within a second local domain, said
second processor using a second address
included within a global read-only address partition
having a read-only mode for said memory
location; and

preventing an access by said second processor
using a third address included within a global
read/write address partition by preventing
transmittal of said access from said second
local domain to said first local domain, whereby
said second processor is prevented from
updating said memory location.

15. The method as recited in claim 14 further comprising
running a first operating system kernel within
said first local domain and a second operating system
kernel within said second local domain.

16. The method as recited in claim 15 wherein said first
operating system kernel and said second operating
system kernel operate independently.

17. The method as recited in claim 16 wherein said first
operating system kernel and said second operating
system kernel communicate via said read-only
address partition.

18. A multiprocessing computer system comprising:

a first local domain including a first processor, a
second processor, and a first memory;

a second local domain including a third processor
and a second memory; and

a first repeater coupled between said first local
domain and said second local domain, said
first repeater configured to receive a first transaction
from said first processor, wherein said
first repeater is configured to transmit said first
transaction to said third processor if said first
transaction is a global transaction, and wherein
said first repeater is configured to inhibit transmission
of said first transaction to said third
processor if said first transaction is a local
transaction.

19. The multiprocessing computer system as recited in 
claim 18 wherein said first repeater is configured to 
transmit said first transaction to said second proc- 
essor, regardless of a global/local nature of said 
first transaction. 

20. The multiprocessing computer system as recited in 
claim 18 wherein, if said first transaction is local, 
said first transaction accesses a memory location 
within said first memory. 

21. The multiprocessing computer system as recited in
claim 18 wherein said first memory and said sec- 
ond memory are encompassed within an address 
space employed by said multiprocessing computer 
system. 

22. The multiprocessing computer system as recited in
claim 21 wherein said address space comprises
multiple address partitions, and wherein said
repeater is configured to determine if a transaction
is local or global based upon an address of said
transaction being in one of said multiple address
partitions.

23. The multiprocessing computer system as recited in 
claim 22 wherein a first one of said multiple address 
partitions is a global address partition, and wherein 
transactions having addresses within said first one
of said multiple address partitions are global trans- 
actions. 



24. The multiprocessing computer system as recited in 
claim 23 wherein a second one of said multiple
address partitions is a local address partition, and 
wherein transactions having addresses within said 
second one of said multiple address partitions are 
local transactions. 

25. The multiprocessing computer system as recited in
claim 24 wherein a third one of said multiple
address partitions is a remote read address partition,
and wherein transactions having addresses
within said third one of said multiple address partitions
are read stream transactions.

26. The multiprocessing computer system as recited in
claim 25 wherein a fourth one of said multiple
address partitions is a remote read/write address
partition, and wherein transactions having
addresses within said fourth one of said multiple
address partitions are either read stream or write
stream transactions.






27. The multiprocessing computer system as recited in
claim 18 wherein said first local domain further
includes a second repeater coupled between said
first processor and said first repeater, and wherein
said first local domain further includes a third
repeater coupled between said second processor
and said first repeater.

28. The multiprocessing computer system as recited in
claim 27 wherein said first memory comprises a
first portion coupled to said second repeater and a
second portion coupled to said third repeater.

29. The multiprocessing computer system as recited in
claim 28 wherein said first repeater is configured to
receive said first transaction from said second
repeater and to transmit said first transaction to
said third repeater.

30. The multiprocessing computer system as recited in
claim 29 wherein said third repeater is configured to
transmit said first transaction to said second processor.

31. A method for operating a multiprocessing computer
system comprising:

receiving a first transaction in a first repeater
from a first processor within a first local domain
of said multiprocessing computer system;

transmitting said first transaction from said first
repeater to a second processor within a second
local domain of said multiprocessing computer
system if said first transaction is a global transaction;
and

inhibiting transmission of said first transaction
to said second processor if said first transaction
is a local transaction.

32. The method as recited in claim 31 further comprising
transmitting said first transaction to a third processor
within said first local domain.

33. The method as recited in claim 32 wherein said first
domain comprises a second repeater coupled
between said first processor and said first repeater
and a third repeater coupled between said third 
processor and said first repeater, and wherein said 
first transaction is routed from said first processor 
through said second repeater to said first repeater, 
and wherein said first transaction is further routed 
from said first repeater through said third repeater 
to said third processor. 

34. A multiprocessing computer system comprising:

a first repeater;

a first local domain comprising a second
repeater coupled to said first repeater and a
third repeater coupled to said first repeater,
wherein said second repeater is coupled to a
first plurality of processors and a first memory,
and wherein said third repeater is coupled to a
second plurality of processors and a second
memory; and

a second local domain comprising a fourth 
repeater coupled to said first repeater and a 
fifth repeater coupled to said first repeater, 
wherein said fourth repeater is coupled to a 
third plurality of processors and a third mem- 
ory, and wherein said fifth repeater is coupled 
to a fourth plurality of processors and a fourth 
memory; 

wherein said first repeater is configured to 
transmit a first transaction from said first local 
domain to said second local domain if said first 
transaction is global, and wherein said first 
repeater is configured to route said first trans- 
action to said second and third repeaters within 
said first local domain.

35. The multiprocessing computer system as recited in 
claim 34 wherein said first memory, said second
memory, said third memory, and said fourth mem- 
ory are encompassed within an address space 
employed by said multiprocessing computer sys- 
tem. 

36. The multiprocessing computer system as recited in
claim 35 wherein said address space comprises
multiple address partitions including a local
address partition used for local transactions and a
global address partition used for global transactions.

37. The multiprocessing computer system as recited in
claim 36 wherein said first repeater is configured to
differentiate local and global transactions via which
one of said multiple address partitions contains an
address of the transactions.